Introduction to Data Analytics

Welcome to ANLY 500

What is Analytics? - Possible Definition 1

What is Analytics? - Possible Definition 2

Scope of Analytics?

What is Descriptive Analytics?

An Example of what to Expect in Descriptive Analytics: Ex.1.1

library(datasets) 
data("sunspot.month") # special way to load embedded data
head(sunspot.month)
#> [1] 58.0 62.6 70.0 55.7 85.0 83.5

An Example of what to Expect in Descriptive Analytics: Ex.1.1

str(sunspot.month)
#>  Time-Series [1:3177] from 1749 to 2014: 58 62.6 70 55.7 85 83.5 94.8 66.3 75.9 75.5 ...

An Example of what to Expect in Descriptive Analytics: Ex.1.1

summary(sunspot.month)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    0.00   15.70   42.00   51.96   76.40  253.80

An Example of what to Expect in Descriptive Analytics: Ex.1.1

library(ggplot2)
sunspot.month <- as.data.frame(sunspot.month)  # the single value column is named "x"
sunspot.month$Time <- 1:nrow(sunspot.month)    # monthly observation index, starting at 1 for January 1749
ggplot(sunspot.month, aes(x = Time, y = x)) + 
  geom_point(alpha = 0.5) + 
  ylab("Number of Sunspots") + 
  xlab("Time") +
  theme_classic()
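
The plot above uses a running observation index on the x-axis. A minimal sketch of a variant that plots against calendar time instead, assuming the goal is a readable year axis; the names `sunspots_df`, `year`, and `count` are illustrative and not part of the original example:

```r
library(ggplot2)

# Build a separate data frame so the built-in dataset is not overwritten.
sunspots_df <- data.frame(
  year  = as.numeric(time(datasets::sunspot.month)),  # decimal year of each month
  count = as.numeric(datasets::sunspot.month)         # monthly sunspot number
)

# Same styling as the original plot, drawn as a line over calendar time.
p <- ggplot(sunspots_df, aes(x = year, y = count)) +
  geom_line(alpha = 0.5) +
  ylab("Number of Sunspots") +
  xlab("Year") +
  theme_classic()
p
```

Using `time()` keeps the date information stored in the time series, so the roughly 11-year solar cycle can be read directly off the axis.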

What is Predictive Analytics?

An Example of what to Expect in Predictive Analytics: Ex.2.1

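library(quantmod)
# Five-year window ending two days before the run date (Sys.Date() already returns a Date)
start <- Sys.Date() - (365 * 5)
end <- Sys.Date() - 2
getSymbols("AMZN", src = "yahoo", from = start, to = end)  # downloads the data; needs network access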
#> [1] "AMZN"
str(AMZN)
#> An 'xts' object on 2016-03-28/2021-03-24 containing:
#>   Data: num [1:1258, 1:6] 584 580 597 599 590 ...
#>  - attr(*, "dimnames")=List of 2
#>   ..$ : NULL
#>   ..$ : chr [1:6] "AMZN.Open" "AMZN.High" "AMZN.Low" "AMZN.Close" ...
#>   Indexed by objects of class: [Date] TZ: UTC
#>   xts Attributes:  
#> List of 2
#>  $ src    : chr "yahoo"
#>  $ updated: POSIXct[1:1], format: "2021-03-27 10:49:01"

An Example of what to Expect in Predictive Analytics: Ex.2.1

predictive_model <- lm(formula = AMZN.Close ~ AMZN.High + AMZN.Low + AMZN.Volume, 
                       data = AMZN[1:1199,])
summary(predictive_model)
#> 
#> Call:
#> lm(formula = AMZN.Close ~ AMZN.High + AMZN.Low + AMZN.Volume, 
#>     data = AMZN[1:1199, ])
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -96.983  -6.229  -0.346   5.907 102.325 
#> 
#> Coefficients:
#>                  Estimate    Std. Error t value
#> (Intercept) 0.00396240516 1.72427178518   0.002
#> AMZN.High   0.46690909373 0.02504102074  18.646
#> AMZN.Low    0.53414375322 0.02573170852  20.758
#> AMZN.Volume 0.00000006973 0.00000029698   0.235
#>                        Pr(>|t|)    
#> (Intercept)               0.998    
#> AMZN.High   <0.0000000000000002 ***
#> AMZN.Low    <0.0000000000000002 ***
#> AMZN.Volume               0.814    
#> ---
#> Signif. codes:  
#> 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 15.87 on 1195 degrees of freedom
#> Multiple R-squared:  0.9995, Adjusted R-squared:  0.9995 
#> F-statistic: 8.37e+05 on 3 and 1195 DF,  p-value: < 0.00000000000000022

An Example of what to Expect in Predictive Analytics: Ex.2.1

par(mfrow = c(2, 3))  # 2 x 3 grid so all five diagnostic plots share one figure
plot(predictive_model, which = 1:5)  # residuals vs fitted, Q-Q, scale-location, Cook's distance, residuals vs leverage

An Example of what to Expect in Predictive Analytics: Ex.2.1

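n <- nrow(AMZN)  # total number of trading days in the download
# Predict the closing price for the rows held out of the fit (1200 onward)
prediction <- stats::predict(predictive_model, AMZN[1200:n,])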
tail(data.frame(prediction))
#>            prediction
#> 2021-03-17   3121.685
#> 2021-03-18   3071.226
#> 2021-03-19   3048.455
#> 2021-03-22   3094.542
#> 2021-03-23   3152.957
#> 2021-03-24   3123.701

An Example of what to Expect in Predictive Analytics: Ex.2.1

plot(prediction, type = "l")
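
The example above fits on the first rows and predicts the rest, but it does not compare the predictions with the actual held-out values. A minimal sketch of that comparison follows. It swaps in the built-in `EuStockMarkets` data so it runs without a network connection. The 80/20 split, the choice of `DAX` as the response, and all variable names are assumptions made for illustration only.

```r
library(datasets)

# EuStockMarkets: daily closing prices of four European indices (built-in data).
stocks <- as.data.frame(EuStockMarkets)

# Chronological split: earlier rows for fitting, later rows held out.
# The 80/20 ratio is an assumption, not taken from the original example.
n_obs   <- nrow(stocks)
n_train <- floor(0.8 * n_obs)
train   <- stocks[1:n_train, ]
test    <- stocks[(n_train + 1):n_obs, ]

# Fit on the training rows only, then predict the held-out rows.
fit       <- lm(DAX ~ SMI + CAC + FTSE, data = train)
predicted <- predict(fit, newdata = test)

# Root mean squared error on the held-out rows, in index points.
rmse <- sqrt(mean((test$DAX - predicted)^2))
rmse

# Actual (solid) versus predicted (dashed) values over the held-out period.
plot(test$DAX, type = "l", xlab = "Held-out day", ylab = "DAX close")
lines(predicted, lty = 2)
```

An error measured on rows the model never saw is usually a more honest guide to predictive quality than the in-sample R-squared.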

What is Prescriptive Analytics?

What does this translate into?

What is Data Analytics?

A Subcomponent of Data Analytics is Data Analysis!

Other Types of Analysis

How to Correctly Apply Data Analytics?

Breaking Down the Research Process - The Initial Observation

Breaking Down the Research Process - Generating Theories

Breaking Down the Research Process - Creating a Hypothesis

Breaking Down the Research Process - Testing Theories & Hypotheses

Breaking Down the Research Process - Identifying the Variables

What’s After the Question & Identifying Variables?

What is Data?

Types of Measurements

Categorical Variables

Categorical Levels of Measurement - Binary

Categorical Levels of Measurement - Nominal

Categorical Levels of Measurement - Ordinal

Continuous Variables

Continuous Levels of Measurement - Interval

Continuous Levels of Measurement - Ratio
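
As a rough illustration of how the five levels of measurement listed above might be stored in R, here is a small sketch. The `survey` data frame and its values are entirely hypothetical.

```r
# Hypothetical toy data: one column per level of measurement.
survey <- data.frame(
  smoker    = c(TRUE, FALSE, FALSE, TRUE),                    # binary: two categories
  eye_color = factor(c("brown", "blue", "green", "brown")),   # nominal: unordered categories
  rating    = factor(c("low", "high", "medium", "low"),
                     levels = c("low", "medium", "high"),
                     ordered = TRUE),                         # ordinal: ordered categories
  temp_c    = c(21.5, 19.0, 23.2, 20.1),                      # interval: equal steps, no true zero
  height_cm = c(170, 182, 165, 176)                           # ratio: equal steps and a true zero
)

str(survey)

# Order comparisons are meaningful for ordinal data only.
below_high <- survey$rating < "high"
below_high

# Ratios are meaningful for ratio-level data only.
survey$height_cm[2] / survey$height_cm[3]
```

Storing ordinal variables as ordered factors lets R reject or allow operations in line with the measurement level. For example, `<` works on `rating` but not on `eye_color`.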

Consider Measurement Error:

How Valid Are My Measures?

Are My Measures Reliable?

Breaking Down the Research Process - Collecting the Data

Cross-Sectional Research

Longitudinal Research

Correlational Research

Experimental Research

Experimental Research - Methods

Breaking Down the Research Process - Methods to Collect the Data

Types of Variation in the Data to Consider:

Breaking Down the Research Process - Analyzing the Data

Population vs Sample

Fitting Models

Fitting Models

tapply(iris$Sepal.Length, iris$Species, mean)
#>     setosa versicolor  virginica 
#>      5.006      5.936      6.588
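
The group means above can also be read as the simplest kind of fitted model. A minimal sketch, assuming a linear model with one coefficient per species and no intercept; the names `fit` and `group_means` are illustrative:

```r
# One coefficient per species; "0 +" removes the intercept so each
# coefficient is that species' mean sepal length.
fit <- lm(Sepal.Length ~ 0 + Species, data = iris)
coef(fit)

# The same numbers computed directly as group means.
group_means <- tapply(iris$Sepal.Length, iris$Species, mean)

# Check that the two approaches agree up to floating-point tolerance.
all.equal(unname(coef(fit)), as.numeric(group_means))
```

Seeing the mean as a model makes it easier to move on to models with more predictors, because the fitting and summary tools stay the same.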

Statistical Modeling Parameters

Statistical Modeling Parameters

iris_sample <- iris[sample(nrow(iris), 15), ]  # random 15-row sample; no seed is set, so results vary by run
tapply(iris_sample$Sepal.Length, iris_sample$Species, mean) #sample
#>     setosa versicolor  virginica 
#>   5.120000   5.925000   6.633333
tapply(iris$Sepal.Length, iris$Species, mean) #population
#>     setosa versicolor  virginica 
#>      5.006      5.936      6.588
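
A single sample gives a single estimate. To see how much such estimates vary, the draw can be repeated many times. The sketch below treats the full `iris` data as the population, as the example above does. The seed, the 1000 repetitions, and the variable names are arbitrary choices for illustration.

```r
set.seed(500)  # arbitrary seed, used only so the run is reproducible

# "Population" parameter: mean sepal length over all 150 rows.
population_mean <- mean(iris$Sepal.Length)

# Draw 1000 samples of 15 rows each and record each sample mean.
sample_means <- replicate(
  1000,
  mean(iris$Sepal.Length[sample(nrow(iris), 15)])
)

# The sample means centre on the population mean...
mean(sample_means)
population_mean

# ...and their spread estimates the standard error of the mean for n = 15.
sd(sample_means)
```

The standard deviation of the repeated sample means shows how far a single 15-row estimate can be expected to fall from the population value.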

Applicable Statistical Models

Summary